Locality-Sensitive Hashing Without False Negatives for l_p

Authors

  • Andrzej Pacuk
  • Piotr Sankowski
  • Karol Wegrzycki
  • Piotr Wygocki
Abstract

In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., functions that ensure a collision for every pair of points within a given radius R in d-dimensional space equipped with the l_p norm, for p ∈ [1,∞]. Furthermore, we show how to use these hash functions to solve the c-approximate nearest neighbor search problem without false negatives. Namely, if there is a point at distance R, we will certainly report it, and points at distance greater than cR will not be reported, for c = Ω(√d, d^{1−1/p}). The constructed algorithms work:

  • with preprocessing time O(n log(n)) and sublinear expected query time,
  • with preprocessing time O(poly(n)) and expected query time O(log(n)).

Our paper reports progress on answering the open problem presented by Pagh [8], who considered the nearest neighbor search without false negatives for the Hamming distance.
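To make the "no false negatives" guarantee concrete, here is a minimal Python sketch of one simple way to obtain it. It is an illustration only and is not claimed to be the authors' construction: the names `make_hash` and `collide`, the choice of a random ±1 projection vector, and the bucket width are all assumptions. The sketch relies on two standard facts: for z ∈ {−1, 1}^d we have |⟨z, x − y⟩| ≤ ‖x − y‖_1, and by Hölder's inequality ‖x − y‖_1 ≤ d^{1−1/p} ‖x − y‖_p. Two values that differ by at most half a bucket width always share a bucket in at least one of two grids offset by half a width, so points within l_p distance R always collide.

```python
import math
import random


def make_hash(d, p, R, seed=0):
    """Return a hash function on d-dimensional points (hypothetical helper).

    Assumption: a single random sign vector z and two offset grids are used;
    this is an illustrative scheme, not the construction from the paper.
    """
    rng = random.Random(seed)
    z = [rng.choice((-1, 1)) for _ in range(d)]
    # Hoelder factor d^(1 - 1/p); for p = infinity the exponent is 1.
    exponent = 1.0 if math.isinf(p) else 1.0 - 1.0 / p
    # Projections of points within l_p distance R differ by at most width / 2.
    width = 2.0 * R * d ** exponent

    def h(x):
        proj = sum(zi * xi for zi, xi in zip(z, x))
        # Bucket index in the base grid and in the grid shifted by width / 2.
        return (math.floor(proj / width), math.floor(proj / width + 0.5))

    return h


def collide(hx, hy):
    """Two points collide if they share a bucket in either grid."""
    return hx[0] == hy[0] or hx[1] == hy[1]


if __name__ == "__main__":
    h = make_hash(d=4, p=2, R=1.0, seed=42)
    x = [0.3, -1.2, 5.0, 2.5]
    y = [0.5, -1.0, 5.1, 2.4]  # l_2 distance from x is well below R = 1
    print(collide(h(x), h(y)))
```

A single projection like this never misses a near pair, but it gives only a weak guarantee against far pairs colliding; a practical scheme would combine many such functions, which is where the approximation factor c and the query-time trade-offs in the abstract come in.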

Similar resources

MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...

Locality-sensitive Hashing without False Negatives

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c > 1, comes close to that of the best possible data i...

On fast bounded locality sensitive hashing

In this paper, we examine the hash functions expressed as scalar products, i.e., f(x) = ⟨v, x⟩, for some bounded random vector v. Such hash functions have numerous applications, but often there is a need to optimize the choice of the distribution of v. In the present work, we focus on so-called anti-concentration bounds, i.e., the upper bounds of P[|⟨v, x⟩| < α]. In many applications, v is...

Fast indexing strategies for robust image hashes

Similarity preserving hashing can aid forensic investigations by providing means to recognize known content and modified versions of known content. However, this raises the need for efficient indexing strategies which support the similarity search. We present and evaluate two indexing strategies for robust image hashes created by the ForBild tool. These strategies are based on generic indexing ...

Hyperplane Arrangements and Locality-Sensitive Hashing with Lift

Locality-sensitive hashing converts high-dimensional feature vectors, such as image and speech, into bit arrays and allows high-speed similarity calculation with the Hamming distance. There is a hashing scheme that maps feature vectors to bit arrays depending on the signs of the inner products between feature vectors and the normal vectors of hyperplanes placed in the feature space. This hashin...

Publication date: 2016